A Detailed Guide to the PettingZoo Multi-Agent Reinforcement Learning Environment Documentation (Part 1)


2023-03-28 02:45 | Source: compiled from the web

Original documentation: https://gymnasium.farama.org/

Source code: https://github.com/Farama-Foundation/PettingZoo

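▶ Before you read:

You should already have taken a reinforcement learning course and have used OpenAI's gym.

Before you start using PettingZoo, follow the README.md at the source code link to set up your environment and install the required packages.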

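▶ Why use PettingZoo:

gym targets single-agent environments: all of its built-in environments are single-agent, and building a custom multi-agent environment on top of it is cumbersome.

PettingZoo focuses on multi-agent environments: all of its built-in environments are multi-agent, and building a custom multi-agent environment with it is convenient.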

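▶ About this post:

These are my notes: a translation and walkthrough of the official PettingZoo documentation, combined with my own hands-on experience.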

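▶ Translation notes:

"agent" is translated as "智能体" (the word can also mean "individual").

"Gymnasium" is not translated; it is used as a proper name. It is the maintained successor to OpenAI's gym and ships under a new package name (when importing, the package is called gymnasium, not gym).

"action" is translated as "行动" or "动作". It refers to what an agent does.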

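▶ Terminology:

Observation refers specifically to what one particular agent observes.

State refers to the environment state, that is, the complete state of the whole environment.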

▶ Let's get started! ◀

PettingZoo is a Python library for conducting research in multi-agent reinforcement learning.

Environments can be interacted with in a manner very similar to Gymnasium:

    from pettingzoo.butterfly import knights_archers_zombies_v10

    env = knights_archers_zombies_v10.env()
    env.reset()
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            action = None  # a finished agent must pass None (see 1.5.3)
        else:
            action = policy(observation, agent)  # policy: your own function
        env.step(action)

1 Basic Usage

1.1 Initializing Environments

Using environments in PettingZoo is very similar to using them in Gymnasium. You initialize an environment via:

    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.env()

Environments are generally highly configurable via arguments at creation, e.g.:

    cooperative_pong.env(ball_speed=18, left_paddle_speed=25,
                         right_paddle_speed=25, is_cake_paddle=True,
                         max_cycles=900, bounce_randomness=False)

1.2 Interacting With Environments

Environments can be interacted with using a similar interface to Gymnasium:

    env.reset()
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            action = None  # a finished agent must pass None (see 1.5.3)
        else:
            action = policy(observation, agent)  # policy: your own function
        env.step(action)

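▶ Here, agent is actually the agent's name (a string).

The commonly used methods are:

agent_iter(max_iter=2**63)
returns an iterator that yields the current agent of the environment. It terminates when all agents in the environment are done or when max_iter steps have been executed.

▶ From my own testing: apart from the max_iter limit, the iterator stops if and only if the agent list (agents, described below, which is really a list of agent names) is empty.

So when writing a custom environment, if I want the game to end (whether because every agent has died or because the maximum number of time steps has been reached), I have to empty the agent list.

In fact, the "truncation" and "termination" flags described below do not stop the iterator by themselves. When writing a custom environment, if you want them to affect the iterator so that the game ends normally, step() has to call the API's self._was_dead_step(action) function.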
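
The behaviour described in this note can be imitated in plain Python. The sketch below is a toy model, not PettingZoo's implementation: ToyEnv, its lives counter and the dead_step helper are made-up names, and dead_step only stands in for the role the note attributes to self._was_dead_step(action). It shows that flagging an agent as terminated does not end the loop; the loop ends only once every name has been removed from agents.

```python
class ToyEnv:
    """Toy stand-in for an AEC environment (hypothetical, not PettingZoo code)."""

    def __init__(self, names, lives):
        self.agents = list(names)                 # names of agents still in play
        self.terminations = {n: False for n in names}
        self.lives = dict(lives)                  # steps each agent survives
        self.agent_selection = self.agents[0]
        self.log = []                             # (agent, action) pairs

    def agent_iter(self, max_iter=2**63):
        # Yield the selected agent until the agents list is empty.
        steps = 0
        while self.agents and steps < max_iter:
            yield self.agent_selection
            steps += 1

    def dead_step(self):
        # Remove the terminated agent; this is what lets the loop end.
        dead = self.agent_selection
        idx = self.agents.index(dead)
        self.agents.remove(dead)
        del self.terminations[dead]
        if self.agents:
            self.agent_selection = self.agents[idx % len(self.agents)]

    def step(self, action):
        agent = self.agent_selection
        self.log.append((agent, action))
        if self.terminations[agent]:
            assert action is None, "a terminated agent must pass None"
            self.dead_step()
            return
        self.lives[agent] -= 1
        if self.lives[agent] == 0:
            self.terminations[agent] = True       # flagged, but still in agents
        idx = self.agents.index(agent)
        self.agent_selection = self.agents[(idx + 1) % len(self.agents)]


env = ToyEnv(["a", "b"], {"a": 1, "b": 2})
for agent in env.agent_iter():
    action = None if env.terminations[agent] else "move"
    env.step(action)
print(env.log)
```

Each agent gets one extra turn with a None action after it is flagged, and only that turn removes it from the list.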

last(observe=True)
returns observation, reward, termination, truncation, and info for the agent currently able to act. The returned reward is the cumulative reward that the agent has received since it last acted. If observe is set to False, the observation will not be computed, and None will be returned in its place. Note that a single agent being done does not imply the environment is done.

▶ This plays the role of the return value of step() in a single-agent gym environment. Unlike that step(), last() takes no action as input; it only returns the observation, reward, termination, truncation and info.

reset()
resets the environment and sets it up for use when called the first time.

step(action)
takes and executes the action of the agent in the environment, and automatically switches control to the next agent.

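▶ Q: The example code has only one for loop, and it iterates over the agent iterator (agent_iter). How can that loop over time steps?

A: The agent iterator keeps cycling: it goes from agent 0 to the last agent, then starts again from agent 0, until the agent list is empty.

The key to the time-step loop is step(), which differs from step() in a single-agent gym environment. Here step() only takes an action as input; it does not return the observation, reward, termination, truncation and info.

In other words, what gym's step() does is split into two parts in PettingZoo: last() and step().

When I write this function myself, I can either update the state (the environment state) immediately after each agent submits its action (as in chess, where players move in turn), or update it only after all agents have submitted one round of actions, treating that round as one complete time step (as in rock paper scissors). For the second style I recommend the Parallel API described later; the code shown here follows the Agent Environment Cycle (AEC) model (see below), where simultaneous moves take a little more work.

The AEC model can still handle simultaneous moves. The check "if self._agent_selector.is_last():" is true when the iteration has reached the last agent, which marks the end of a time step, so it can be used to delimit one complete time step.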
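
The round-boundary idea can be tried out without PettingZoo. The sketch below is a toy model under stated assumptions: ToyRPS and its is_last method are hand-written stand-ins (the real check in the note is self._agent_selector.is_last()), and the reward values are arbitrary. Moves are buffered as each agent steps, and the shared state is updated only when the last agent of the round has acted.

```python
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


class ToyRPS:
    """Two-player rock paper scissors stepped one agent at a time (toy model)."""

    def __init__(self):
        self.agents = ["player_0", "player_1"]
        self.index = 0                        # position of the selected agent
        self.pending = {}                     # moves buffered during the round
        self.rewards = {a: 0 for a in self.agents}
        self.rounds = 0                       # completed time steps

    @property
    def agent_selection(self):
        return self.agents[self.index]

    def is_last(self):
        # True when the selected agent is the last one of the round.
        return self.index == len(self.agents) - 1

    def step(self, action):
        self.pending[self.agent_selection] = action
        if self.is_last():
            # Everyone has moved: only now is the shared state updated.
            a, b = (self.pending[name] for name in self.agents)
            if BEATS[a] == b:
                self.rewards = {"player_0": 1, "player_1": -1}
            elif BEATS[b] == a:
                self.rewards = {"player_0": -1, "player_1": 1}
            else:
                self.rewards = {"player_0": 0, "player_1": 0}
            self.pending = {}
            self.rounds += 1
        self.index = (self.index + 1) % len(self.agents)


env = ToyRPS()
for move in ["rock", "scissors", "paper", "paper"]:
    env.step(move)
print(env.rounds, env.rewards)
```

Four calls to step() make two rounds; the first player's move has no visible effect until the second player has also moved.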

1.3 Additional Environment API

PettingZoo models games as Agent Environment Cycle (AEC) games, and thus can support any game multi-agent RL can consider, allowing for fantastically weird cases. Because of this, our API includes lower level functions and attributes that you probably won't need but are very important when you do. Their functionality is used to implement the high-level functions above though, so including them is just a matter of code factoring.

agents
A list of the names of all current agents, typically integers. These may be changed as an environment progresses (i.e. agents can be added or removed).

▶ Note: the agents list only stores the agents' names, not the agents themselves (by which I mean "agent" objects defined with a class). The names are not even agent IDs (indices into the agent list).

num_agents
The length of the agents list.

▶ The "agents list" here is the agents attribute above.

agent_selection
An attribute of the environment corresponding to the currently selected agent that an action can be taken for.

❓ I did not fully understand this one at first. As far as I can tell, it holds the name of the agent whose turn it is, that is, the agent that the next step(action) applies to.

observation_space(agent)
A function that retrieves the observation space for a particular agent. This space should never change for a particular agent ID.

action_space(agent)
A function that retrieves the action space for a particular agent. This space should never change for a particular agent ID.

terminations
A dict of the termination state of every current agent at the time called, keyed by name. last() accesses this attribute. Note that agents can be added or removed from this dict. The returned dict looks like:

    terminations = {0: [first agent's termination state], 1: [second agent's termination state] ... n-1: [nth agent's termination state]}

▶ Note: the keys of the terminations dict are agent names, and the values are True or False.

▶ My understanding of terminations: when an agent dies, its value should be set to True.

truncations
A dict of the truncation state of every current agent at the time called, keyed by name. last() accesses this attribute. Note that agents can be added or removed from this dict. The returned dict looks like:

    truncations = {0: [first agent's truncation state], 1: [second agent's truncation state] ... n-1: [nth agent's truncation state]}

▶ Q: What is the difference between the "truncation" and "termination" states?

A: A truncation value (truncations) is set to True when the number of executed time steps reaches the configured maximum (for example, when the game's time runs out). A termination value (terminations) is set to True for an agent when that agent dies.

In practice I found that if an agent's truncation or termination value is True and the action passed in is not None, an error is raised. This is because step() uses the self._was_dead_step(action) function.

Again, truncation and termination do not stop the iterator by themselves. In a custom environment, step() has to call the API's self._was_dead_step(action) function for them to affect the iterator and let the game end normally.
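
The distinction can be written down as a small pure-Python helper. This is only an illustrative sketch: episode_flags, check_action and their parameters are hypothetical names, and the ValueError stands in for whatever error the real environment raises.

```python
def episode_flags(alive, cycle, max_cycles):
    """Return (terminations, truncations) dicts for one time step.

    alive      -- dict mapping agent name to True while the agent is alive
    cycle      -- number of completed time steps
    max_cycles -- time limit of the episode
    """
    # Termination is per agent: it depends on that agent's own fate.
    terminations = {agent: not is_alive for agent, is_alive in alive.items()}
    # Truncation comes from the time limit and hits every agent at once.
    out_of_time = cycle >= max_cycles
    truncations = {agent: out_of_time for agent in alive}
    return terminations, truncations


def check_action(agent, action, terminations, truncations):
    # A finished agent may only submit None.
    if (terminations[agent] or truncations[agent]) and action is not None:
        raise ValueError(f"{agent} is done; its action must be None")
    return action


alive = {"knight_0": True, "archer_0": False}
terminations, truncations = episode_flags(alive, cycle=3, max_cycles=10)
print(terminations)
print(truncations)
```

With these inputs only archer_0 is terminated and nobody is truncated; raising cycle to 10 would truncate both agents regardless of who is alive.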

infos
A dict of info for each current agent, keyed by name. Each agent's info is also a dict. Note that agents can be added or removed from this attribute. last() accesses this attribute. The returned dict looks like:

    infos = {0: [first agent's info], 1: [second agent's info] ... n-1: [nth agent's info]}

▶ This infos plays much the same role as the info returned by gym's step(): both are used on the reinforcement learning side for tuning. It is of little use when writing a custom environment.

observe(agent)
Returns the observation an agent currently can make. last() calls this function.

rewards
A dict of the rewards of every current agent at the time called, keyed by name. It holds the instantaneous reward generated after the last step. Note that agents can be added or removed from this attribute. last() does not directly access this attribute, rather the returned reward is stored in an internal variable. The rewards structure looks like:

    {0: [first agent's reward], 1: [second agent's reward] ... n-1: [nth agent's reward]}

▶ Q: Which internal variable does "the returned reward is stored in an internal variable" refer to?

A: When writing a custom environment, you declare a self.rewards variable yourself to hold the per-step rewards. The value that last() returns, as far as I can tell from the AECEnv base class, comes from self._cumulative_rewards, which self._accumulate_rewards() updates from self.rewards.
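
The difference between the instantaneous reward and the cumulative reward reported by last() is easy to see in a toy ledger. The sketch below is plain Python with hypothetical names (RewardBook, record_step, start_action, last_reward); it imitates the bookkeeping described above and is not PettingZoo's code.

```python
class RewardBook:
    """Toy bookkeeping for per-step and cumulative rewards (hypothetical names)."""

    def __init__(self, agents):
        self.rewards = {a: 0.0 for a in agents}       # reward from the latest step
        self.cumulative = {a: 0.0 for a in agents}    # total since the agent last acted

    def start_action(self, agent):
        # The acting agent has already seen its total, so start it afresh.
        self.cumulative[agent] = 0.0

    def record_step(self, step_rewards):
        # Store the instantaneous rewards and add them to the running totals.
        for agent in self.rewards:
            self.rewards[agent] = step_rewards.get(agent, 0.0)
            self.cumulative[agent] += self.rewards[agent]

    def last_reward(self, agent):
        # What a last()-style call would report for this agent.
        return self.cumulative[agent]


book = RewardBook(["a", "b"])
book.start_action("a")
book.record_step({"a": 1.0, "b": 0.5})    # outcome of a's action
seen_by_b = book.last_reward("b")
book.start_action("b")
book.record_step({"a": 2.0})              # outcome of b's action
seen_by_a = book.last_reward("a")
print(seen_by_b, seen_by_a, book.rewards["a"])
```

Agent a earned 1.0 from its own step and 2.0 from b's step, so it sees 3.0 on its next turn even though the instantaneous entry in rewards is only 2.0.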

seed(seed=None)
Reseeds the environment. reset() must be called after seed(), and before step().

▶ The call order is: seed() → reset() → step()

render()
Returns a rendered frame from the environment using the render mode specified at initialization. If the render mode is 'rgb_array', it returns a numpy array, while with 'ansi' it returns the strings printed. There is no need to call render() with human mode.

▶ I take this to mean: if the render mode specified at initialization is human, there is no need to call render() separately, because the (human mode) rendering window is shown even without that call. In a custom environment you have to write this into step() yourself, for example:

    if self.render_mode == "human":
        self.render()

close()
Closes the rendering window.

▶ Q: Strictly speaking, does this close only the rendering window, or the game environment env itself?

A: The comment in the official code says that "closing should release any graphical displays, subprocesses, network connections". So in a custom environment that has no graphical window, subprocess or network connection, the body of this function can simply be pass, and there is no need to explicitly "close the game environment env" after using it.

1.4 Optional API Components

While not required by the base API, most downstream wrappers and utilities depend on the following attributes and methods, and they should be added to new environments except in special circumstances where adding one or more is not possible.

possible_agents
A list of all possible_agents the environment could generate. Equivalent to the list of agents in the observation and action spaces. This cannot be changed through play or resetting.

▶ Q: What does this mean, and what is it for?

A: It should be the set of permitted (legal) agent names. The actual agents list created when the environment is initialized can only draw names from it.
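
A small sketch may make the relationship between possible_agents and agents concrete. Roster and its methods are hypothetical names used only for illustration; the attribute names mirror the ones in the documentation, but this is not how PettingZoo itself is implemented.

```python
class Roster:
    """Toy model of possible_agents versus agents (hypothetical class)."""

    def __init__(self, possible_agents):
        self.possible_agents = tuple(possible_agents)   # fixed for the env's lifetime
        self.agents = []                                # names currently in play

    @property
    def max_num_agents(self):
        return len(self.possible_agents)

    @property
    def num_agents(self):
        return len(self.agents)

    def reset(self):
        # Every episode starts from the full list of permitted names.
        self.agents = list(self.possible_agents)

    def remove(self, name):
        self.agents.remove(name)

    def spawn(self, name):
        # Only names declared up front may ever appear in agents.
        if name not in self.possible_agents:
            raise ValueError(f"{name} is not in possible_agents")
        if name not in self.agents:
            self.agents.append(name)


roster = Roster(["archer_0", "archer_1", "knight_0"])
roster.reset()
roster.remove("archer_1")
print(roster.num_agents, roster.max_num_agents)
roster.spawn("archer_1")
print(roster.agents)
```

The agents list shrinks and grows during play, while possible_agents and max_num_agents stay the same.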

max_num_agents
The length of the possible_agents list.

observation_spaces
A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting.

action_spaces
A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting.

▶ In "play or resetting" above, "play" means while the game is in progress, and "resetting" means resetting the game with reset().

state()
Returns a global observation of the current state of the environment. Not all environments will support this feature.

state_space
The space of a global observation of the environment. Not all environments will support this feature.

1.5 Notable Idioms

1.5.1 Checking if the entire environment is done

When an agent is terminated or truncated, it's removed from agents, so when the environment is done, agents will be an empty list. This means not env.agents is a simple condition for the environment being done.

▶ In other words, not env.agents can be used to check whether the game is over.

▶ In practice, the behaviour "when an agent is terminated or truncated, it's removed from agents" relies on the self._was_dead_step(action) function.

1.5.2 Unwrapping an environment

If you have a wrapped environment, and you want to get the unwrapped environment underneath all the layers of wrappers (so that you can manually call a function or change some underlying aspect of the environment), you can use the .unwrapped attribute. If the environment is already a base environment, the .unwrapped attribute will just return itself.

    base_env = knights_archers_zombies_v10.env().unwrapped

▶ Q: Why wrap an environment, and how?

A: It does not matter much here. One case where wrapping is needed is converting a parallel environment into a sequential (AEC) one.
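
For readers who want a feel for what a wrapper is, here is a minimal sketch in plain Python. ToyBaseEnv and ToyOrderCheck are hypothetical classes written for this note, not PettingZoo wrappers; they only illustrate how a wrapper adds a check around an environment and how .unwrapped peels off every layer.

```python
class ToyBaseEnv:
    """A bare environment with no safety checks (hypothetical)."""

    def __init__(self):
        self.has_reset = False
        self.steps = 0

    def reset(self):
        self.has_reset = True
        self.steps = 0

    def step(self, action):
        self.steps += 1

    @property
    def unwrapped(self):
        return self                      # a base environment returns itself


class ToyOrderCheck:
    """Toy wrapper that rejects step() before reset()."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        self.env.reset()

    def step(self, action):
        if not self.unwrapped.has_reset:
            raise RuntimeError("reset() must be called before step()")
        self.env.step(action)

    @property
    def unwrapped(self):
        return self.env.unwrapped        # delegate until the base is reached


base = ToyBaseEnv()
wrapped = ToyOrderCheck(ToyOrderCheck(base))   # two layers of wrapping
print(wrapped.unwrapped is base)
```

Calling wrapped.step(0) before wrapped.reset() raises RuntimeError, whereas the bare ToyBaseEnv would silently accept it.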

1.5.3 Variable Numbers of Agents (Death)

Agents can die and be generated during the course of an environment. If an agent dies, then its entry in the terminations dictionary is set to True, it becomes the next selected agent (or after another agent that is also terminated or truncated), and the action it takes is required to be None. After this vacuous step is taken, the agent will be removed from agents and other changeable attributes. Agent generation can just be done with appending it to agents and the other changeable attributes (with it already being in the possible agents and action/observation spaces), and transitioning to it at some point with agent_iter.

▶ The rule here is that when an agent dies, its entry in the terminations dictionary should be set to True, and the action it takes must be None.

I tried it: if a dead agent passes an action other than None to the environment, an error is raised. This comes from the self._was_dead_step(action) function.

▶ The most troublesome part is "After this vacuous step is taken, the agent will be removed from agents and other changeable attributes."

What if my game allows agents to come back to life after dying? Removing them outright could end the game prematurely: for example, at some time step all agents may be dead but waiting to respawn, and I do not want the game to end at that point.

So my solution is: if the game allows respawning, do not set the terminations entry to True when an agent dies.
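
One way to realise this workaround is to track death separately from terminations. The sketch below is only an assumption about how that could look: RespawnTracker, respawn_in and respawn_delay are names invented for this note, and nothing here comes from PettingZoo.

```python
class RespawnTracker:
    """Toy bookkeeping for agents that die and come back (hypothetical names)."""

    def __init__(self, agents, respawn_delay):
        self.agents = list(agents)                    # never shrinks in this scheme
        self.terminations = {a: False for a in agents}
        self.respawn_in = {a: 0 for a in agents}      # 0 means the agent is alive
        self.respawn_delay = respawn_delay

    def is_alive(self, agent):
        return self.respawn_in[agent] == 0

    def kill(self, agent):
        # Start the respawn countdown instead of setting terminations[agent].
        self.respawn_in[agent] = self.respawn_delay

    def tick(self):
        # Advance one time step; waiting agents move closer to respawning.
        for agent in self.agents:
            if self.respawn_in[agent] > 0:
                self.respawn_in[agent] -= 1


tracker = RespawnTracker(["a", "b"], respawn_delay=2)
tracker.kill("a")
tracker.kill("b")                      # everyone is dead at the same time
still_running = bool(tracker.agents)   # the agents list is not empty
tracker.tick()
tracker.tick()
print(still_running, tracker.is_alive("a"), tracker.is_alive("b"))
```

Because terminations stays False and agents is never emptied, the episode keeps running while every agent waits to respawn. A dead agent still receives turns in this scheme, so the environment has to ignore or mask its actions while is_alive is False.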

1.5.4 Environment as an Agent

In certain cases, separating agent from environment actions is helpful for studying. This can be done by treating the environment as an agent. We encourage calling the environment actor env in env.agents, and having it take None as an action.

1.6 Raw Environments

Environments are by default wrapped in a handful of lightweight wrappers that handle error messages and ensure reasonable behavior given incorrect usage (e.g. playing illegal moves or stepping before resetting). However, these add a very small amount of overhead. If you want to create an environment without them, you can do so by using the raw_env() constructor contained within each module:

    env = knights_archers_zombies_v10.raw_env()

2 Environment Creation

This documentation overviews creating new environments and relevant useful wrappers, utilities and tests included in PettingZoo designed for the creation of new environments.

2.1 Example Custom Environment

This is a carefully commented version of the PettingZoo rock paper scissors environment.

▶ The environment code is in Example_Custom_Environment.py

▶ The code that uses this environment is in t_ece.py

▶ For details, see my companion post "PettingZoo Multi-Agent Game Environment: Custom Environment Example, Code Walkthrough".


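2.2 Example Custom Parallel Environment

▶ The environment code is in Example_Parallel_Environment.py

▶ The code that uses this environment is in t_epe.py

▶ For details, see my companion post "PettingZoo Multi-Agent Game Environment: Custom Parallel Environment Example, Code Walkthrough".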


2.3 Using Wrappers

A wrapper is an environment transformation that takes in an environment as input, and outputs a new environment that is similar to the input environment, but with some transformation or validation applied. PettingZoo provides wrappers to convert environments back and forth between the AEC API and the Parallel API and a set of simple utility wrappers which provide input validation and other convenient reusable logic. PettingZoo also includes wrappers via the SuperSuit companion package (pip install supersuit).

▶ I have no use for SuperSuit for now, so it is not important here.

2.4 Developer Utils

The utils directory contains a few functions which are helpful for debugging environments. These are documented in the API docs.

The utils directory also contains some classes which are only helpful for developing new environments. These are documented below.

2.4.1 Agent selector

The agent_selector class steps through agents in a cycle.

It can be used as follows to cycle through the list of agents:

    from pettingzoo.utils import agent_selector

    agents = ["agent_1", "agent_2", "agent_3"]
    selector = agent_selector(agents)
    agent_selection = selector.reset()
    # agent_selection will be "agent_1"
    for i in range(100):
        agent_selection = selector.next()
        # will select "agent_2", "agent_3", "agent_1", "agent_2", "agent_3", ...

▶ I verified that the agent_selector class keeps cycling through the agents indefinitely.
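
To see why the selector never runs out, it helps to write a look-alike by hand. CyclicSelector below is a hypothetical re-implementation for illustration only; it reproduces the reset()/next() behaviour shown above and adds an is_last() check, but it is not the source of pettingzoo.utils.agent_selector.

```python
class CyclicSelector:
    """Hand-written analogue of a cyclic agent selector (not PettingZoo's source)."""

    def __init__(self, agent_order):
        self.agent_order = list(agent_order)
        self.reset()

    def reset(self):
        # Go back to the first agent and return its name.
        self._index = 0
        self.selected = self.agent_order[0]
        return self.selected

    def next(self):
        # Advance by one, wrapping around after the last agent.
        self._index = (self._index + 1) % len(self.agent_order)
        self.selected = self.agent_order[self._index]
        return self.selected

    def is_last(self):
        # True when the current agent closes the cycle.
        return self.selected == self.agent_order[-1]


selector = CyclicSelector(["agent_1", "agent_2", "agent_3"])
first = selector.reset()
visited = [selector.next() for _ in range(5)]
print(first, visited)
```

The modulo in next() is what makes the cycle endless, and is_last() is the natural place to hook end-of-round logic.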

2.4.2 Deprecated Module

The DeprecatedModule is used in PettingZoo to help guide the user away from old obsolete environment versions and toward new ones. If you wish to create a similar versioning system, this may be helpful.

For example, when the user tries to import the knights_archers_zombies_v0 environment, they import the following variable (defined in pettingzoo/butterfly/__init__.py):

    from pettingzoo.utils.deprecated_module import DeprecatedModule

    knights_archers_zombies_v0 = DeprecatedModule("knights_archers_zombies", "v0", "v10")

This declaration tells the user that knights_archers_zombies_v0 is deprecated and knights_archers_zombies_v10 should be used instead. In particular, it gives the following error:

    from pettingzoo.butterfly import knights_archers_zombies_v0

    knights_archers_zombies_v0.env()
    # pettingzoo.utils.deprecated_module.DeprecatedEnv: knights_archers_zombies_v0 is now deprecated, use knights_archers_zombies_v10 instead
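
If you want a similar versioning scheme in your own project, the idea can be sketched with __getattr__. DeprecatedStub and DeprecatedEnvError are names made up for this example and the implementation is an assumption about how such a stub might work, not a copy of PettingZoo's DeprecatedModule.

```python
class DeprecatedEnvError(ImportError):
    """Raised when a retired environment version is used (hypothetical)."""


class DeprecatedStub:
    """Placeholder object that fails loudly on any use (toy sketch)."""

    def __init__(self, name, old_version, new_version):
        self.name = name
        self.old_version = old_version
        self.new_version = new_version

    def __getattr__(self, attr):
        # Only reached for attributes not set in __init__, such as env.
        raise DeprecatedEnvError(
            f"{self.name}_{self.old_version} is now deprecated, "
            f"use {self.name}_{self.new_version} instead"
        )


knights_archers_zombies_v0 = DeprecatedStub("knights_archers_zombies", "v0", "v10")
try:
    knights_archers_zombies_v0.env()
except DeprecatedEnvError as err:
    message = str(err)
print(message)
```

The import of the old name still succeeds; the error appears only when the user touches the stub, which is where the replacement version can be suggested.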

3 Testing Environments

PettingZoo has a number of compliance tests for environments. If you are adding a new environment, we encourage you to run these tests on your own environment.

3.1 API Test

PettingZoo's API has a number of features and requirements. To make sure your environment is consistent with the API, we have the api_test. Below is an example:

    from pettingzoo.test import api_test
    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.env()
    api_test(env, num_cycles=1000, verbose_progress=False)

As you can tell, you simply pass an environment to the test. The test will assert or give some other error on an API issue, and will return normally if it passes.

The optional arguments are:

num_cycles
runs the environment for that many cycles and checks that the output is consistent with the API.

verbose_progress
prints out messages to indicate partial completion of the test. Useful for debugging environments.

▶ The purpose of the API test: once I have written my own (custom) environment, I use api_test() to check whether it conforms to the API.

3.2 Parallel API Test

This is an analogous version of the API test, but for parallel environments. You can use this test like:

    from pettingzoo.test import parallel_api_test
    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.parallel_env()
    parallel_api_test(env, num_cycles=1000)

▶ If the environment I wrote is a parallel environment, this is the test to use.

3.3 Seed Test

To have a properly reproducible environment that utilizes randomness, you need to be able to make it deterministic during evaluation by setting a seed for the random number generator that defines the random behavior. The seed test checks that calling the seed() method with a constant actually makes the environment deterministic.

The seed test takes in a function that creates a pettingzoo environment. For example:

    from pettingzoo.test import seed_test, parallel_seed_test
    from pettingzoo.butterfly import pistonball_v6

    env_fn = pistonball_v6.env
    seed_test(env_fn, num_cycles=10, test_kept_state=True)

    # or for parallel environments
    parallel_env_fn = pistonball_v6.parallel_env
    parallel_seed_test(parallel_env_fn, num_cycles=10, test_kept_state=True)

Internally, there are two separate tests.

1. Do two separate environments give the same result after the environment is seeded?

2. Does a single environment give the same result after seed() then reset() is called?

The first optional argument, num_cycles, indicates how long the environment will be run to check for determinism. Some environments only fail the test long after initialization.

The second optional argument, test_kept_state, allows the user to disable the second test. Some physics based environments fail this test due to barely detectable differences due to caches, etc, which are not important enough to matter.

❓ This part is not entirely clear to me, and it is not very important.
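
The two checks become clearer with a toy example. Everything in the sketch below (NoisyEnv, LeakyEnv, rollout and the two check functions) is invented for illustration and uses only Python's random module; it imitates the idea of the seed test and makes no claim about how seed_test is implemented.

```python
import random


class NoisyEnv:
    """Toy environment whose only state is a randomly growing position."""

    def __init__(self):
        self.rng = random.Random()
        self.position = 0

    def seed(self, seed=None):
        self.rng = random.Random(seed)

    def reset(self):
        self.position = 0

    def step(self):
        self.position += self.rng.choice([1, 2])
        return self.position


class LeakyEnv(NoisyEnv):
    def reset(self):
        pass  # bug: position is not cleared, so state leaks between episodes


def rollout(env, seed, num_cycles):
    env.seed(seed)
    env.reset()
    return [env.step() for _ in range(num_cycles)]


def same_after_seeding(env_fn, num_cycles=10):
    # Check 1: two separate environments, same seed.
    return rollout(env_fn(), 42, num_cycles) == rollout(env_fn(), 42, num_cycles)


def same_after_reseeding(env_fn, num_cycles=10):
    # Check 2: one environment, seeded and reset twice (the "kept state" check).
    env = env_fn()
    return rollout(env, 42, num_cycles) == rollout(env, 42, num_cycles)


print(same_after_seeding(NoisyEnv), same_after_reseeding(NoisyEnv))
print(same_after_seeding(LeakyEnv), same_after_reseeding(LeakyEnv))
```

LeakyEnv passes the first check but fails the second, because something survives from one episode to the next. That is the kind of leftover state the second check looks for, and test_kept_state=False is the switch for environments where such leftovers are harmless.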

3.4 Max Cycles Test

The max cycles test tests that the max_cycles environment argument exists and the resulting environment actually runs for the correct number of cycles. If your environment does not take a max_cycles argument, you should not run this test. The reason this test exists is that many off-by-one errors are possible when implementing max_cycles. An example test usage looks like:

    from pettingzoo.test import max_cycles_test
    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.env()
    max_cycles_test(env)

3.5 Render Test

The render test checks that rendering 1) does not crash and 2) produces output of the correct type when given a mode (only supports 'human', 'ansi', and 'rgb_array' modes).

    from pettingzoo.test import render_test
    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.env()
    render_test(env)

The render test method takes in an optional argument custom_tests that allows for additional tests in non-standard modes.

    custom_tests = {
        "svg": lambda render_result: isinstance(render_result, str)
    }
    render_test(env, custom_tests=custom_tests)

▶ The "non-standard modes" here are any modes other than 'human', 'ansi' and 'rgb_array'.

3.6 Performance Benchmark Test

To make sure we do not have performance regressions, we have the performance benchmark test. This test simply prints out the number of steps and cycles that the environment takes in 5 seconds. This test requires manual inspection of its outputs:

    from pettingzoo.test import performance_benchmark
    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.env()
    performance_benchmark(env)

▶ This should mean the number of time steps that can be executed within 5 seconds. The larger the number, the better the performance.

▶ In my own run, however, what was printed was the number of steps and cycles per second.
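
The measurement itself is simple enough to reproduce by hand. The sketch below is a generic timing loop in plain Python with hypothetical names (measure_throughput, step_fn, num_agents, duration); it is not the code of performance_benchmark, only an illustration of how steps and cycles per second relate.

```python
import time


def measure_throughput(step_fn, num_agents, duration=0.05):
    """Call step_fn repeatedly for `duration` seconds.

    Returns (steps, steps_per_second, cycles_per_second), where one cycle
    is taken to be one step by each of the num_agents agents.
    """
    steps = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        step_fn()
        steps += 1
    elapsed = time.perf_counter() - start
    steps_per_second = steps / elapsed
    cycles_per_second = steps_per_second / num_agents
    return steps, steps_per_second, cycles_per_second


# A do-nothing step function stands in for env.step(action).
steps, step_rate, cycle_rate = measure_throughput(lambda: None, num_agents=4)
print(f"{steps} steps, {step_rate:.0f} steps/s, {cycle_rate:.0f} cycles/s")
```

Dividing by the elapsed time gives a per-second rate that does not depend on the chosen window, which may be why the printed figures are per second rather than totals over 5 seconds.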

3.7 Save Observation Test

The save observation test is to visually inspect the observations of games with graphical observations to make sure they are what is intended. We have found that observations are a huge source of bugs in environments, so it is good to manually check them when possible. This test just tries to save the observations of all the agents. If it fails, then it just prints a warning. The output needs to be visually inspected for correctness.

    from pettingzoo.test import test_save_obs
    from pettingzoo.butterfly import pistonball_v6

    env = pistonball_v6.env()
    test_save_obs(env)

▶ In my own run, only a single time step (one frame) was saved, and each image is the observation of one agent.

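To be continued.

This post covers only the first part of the original documentation, the introduction. I will go on to study the remaining parts: API, Environments, and Tutorials.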
